Retry shim start without userns on clone failure#150
Retry shim start without userns on clone failure#150dmcgowan merged 1 commit intocontainerd:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR makes shim startup more resilient on Linux by retrying Start() without user/mount namespace clone flags when the initial start fails after enabling user namespaces, allowing the shim to degrade to no mount isolation instead of failing the container start.
Changes:
- Change
cloneMntNsto return a boolean indicating whether userns/mntns clone flags were applied. - On Unix, retry shim
cmd.Start()without namespace isolation when the first start fails andcloneMntNshad enabled user namespaces. - Update non-Linux stub implementation to match the new
cloneMntNssignature.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/shim/manager/mount_other.go | Update non-Linux cloneMntNs stub to return bool (always false). |
| internal/shim/manager/mount_linux.go | Return whether userns/mntns clone flags were applied so callers can distinguish the path taken. |
| internal/shim/manager/manager_unix.go | Retry shim start without namespace isolation when a userns-enabled start fails; add logging for the fallback. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ccfb76d to
c010af3
Compare
c010af3 to
f68d962
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f68d962 to
f040567
Compare
cloneMntNs sets CLONE_NEWUSER|CLONE_NEWNS on the child, but clone can fail for reasons the proactive AppArmor sysctl check cannot detect — seccomp filters, other LSM policies, or EACCES when inherited socket fds cross the user namespace boundary after exec triggers capability recomputation. Return whether namespace flags were set so the caller can distinguish a namespace-related Start failure from an unrelated one. On failure, rebuild the command without clone flags and retry, degrading gracefully to no mount isolation rather than failing the container start entirely. Signed-off-by: Derek McGowan <derek@mcg.dev>
f040567 to
5db9fb1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cloneMntNs sets CLONE_NEWUSER|CLONE_NEWNS on the child, but clone can fail for reasons the proactive AppArmor sysctl check cannot detect — seccomp filters, other LSM policies, or EACCES when inherited socket fds cross the user namespace boundary after exec triggers capability recomputation.
Return whether namespace flags were set so the caller can distinguish a namespace-related Start failure from an unrelated one. On failure, rebuild the command without clone flags and retry, degrading gracefully to no mount isolation rather than failing the container start entirely.